Abstract. Sosemanuk is a new synchronous software-oriented stream 
cipher, corresponding to Profile 1 of the ECRYPT call for stream cipher 
primitives. Its key length is variable between 128 and 256 bits. It ac- 
commodates a 128-bit initial value. Any key length is claimed to achieve 
128-bit security. The Sosemanuk cipher uses both some basic design 
principles from the stream cipher SNOW 2.0 and some transformations 
derived from the block cipher SERPENT. Sosemanuk aims at improv- 
ing SNOW 2.0 both from the security and from the efficiency points of 
view. Most notably, it uses a faster IV-setup procedure. It also requires 
a reduced amount of static data, yielding better performance on several 
architectures. 



1 Introduction 

This paper presents a proposal for a new synchronous software-oriented 
stream cipher, named Sosemanuk. The Sosemanuk cipher uses both 
basic design principles from the stream cipher SNOW 2.0 [12] and trans- 
formations derived from the block cipher SERPENT [3]. For this reason, 
its name should refer both to SERPENT and SNOW. However, it is well
known that snow snakes do not exist since snakes either hibernate or
move to warmer climes during the winter. Instead, Sosemanuk is a popular
sport played by the Eastern Canadian tribes. It consists of throwing
a wooden stick along a snow bank as far as possible. Its name means 
snowsnake in the Cree language, since the stick looks like a snake in the 
snow. Kwakweco-cime win is a variant of the same game but does not 
sound like an appropriate cipher name. More details on the Sosemanuk 
game and a demonstration can be found in [19] and [24]. 

The Sosemanuk stream cipher is a new synchronous stream cipher 
dedicated to software applications. Its key length is variable between 128 
and 256 bits. Any key length is claimed to achieve 128-bit security. It is 
inspired by the design of SNOW 2.0 which is very elegant and achieves 
a very high throughput on a Pentium 4. Sosemanuk aims at improving 
SNOW 2.0 in two respects. First, it avoids some structural properties 
which may appear as potential weaknesses, even if the SNOW 2.0 cipher 
with a 128-bit key resists all known attacks. Second, efficiency is improved 
on several architectures by reducing the internal state size, thus allowing 
for a more direct mapping of data on the processor registers. Sosemanuk 
also requires a reduced amount of static data; this lower data cache pres- 
sure yields better performance on several architectures. Another strength 
of Sosemanuk is that its key setup procedure is based on a reduced 
version of the well-known block cipher SERPENT, improving classical 
initialization procedures both from an efficiency and a security point of 
view. 

2 Specification 

2.1 SERPENT and derivatives 

SERPENT [3] is a block cipher proposed as an AES candidate. SERPENT 
operates over blocks of 128 bits which are split into four 32-bit words, 
which are then combined in so-called "bitslice" mode. SERPENT can 
thus be defined as working over quartets of 32-bit words. We number
SERPENT input and output quartets from 0 to 3, and write them in
the order $(Y_3, Y_2, Y_1, Y_0)$. $Y_0$ is the least significant word, and
contains the least significant bits of the 32 4-bit inputs to the SERPENT
S-boxes. When SERPENT output is written into 16 bytes, the $Y_i$ values are
written following the little-endian convention (least significant byte
first), and $Y_0$ is output first, then $Y_1$, and so on.

From SERPENT, we define two primitives called Serpent1 and Serpent24.



Serpent1. A SERPENT round consists of, in that order:

— a subkey addition, by bitwise exclusive or; 

— S-box application (which is expressed as a set of bitwise combinations 
between the four running 32-bit words, in bitslice mode); 

— a linear bijective transformation (which amounts to a few XORs, shifts 
and rotations in bitslice mode), see Appendix A.2.

Serpent1 is one round of SERPENT, without the key addition and the
linear transformation. SERPENT uses eight distinct S-boxes (see A.1 for
details), numbered from $S_0$ to $S_7$, on 4-bit words. We define Serpent1
as the application of $S_2$, in bitslice mode. This is the third S-box
layer of SERPENT. Serpent1 takes four 32-bit words as input, and provides
four 32-bit words as output.

Serpent24. Serpent24 is SERPENT reduced to 24 rounds, instead of the
32 rounds of the full version of SERPENT. Serpent24 is equal to the first
24 rounds of SERPENT, where the last round (the 24th) is a complete
round: it includes the linear transformation and an XOR with the 25th
subkey. In other words, the 24th round of Serpent24 is equivalent to the
thirty-second round of SERPENT, except that it contains the linear
transformation and that the 24th and 25th subkeys are used (32nd and
33rd subkeys in SERPENT). Thus, the last round equation on Page 224
in [3] becomes

$$R_{23}(X) = L(\hat{S}_{23}(X \oplus \hat{K}_{23})) \oplus \hat{K}_{24}.$$

Serpent24 uses only 25 128-bit subkeys, which are the first 25 subkeys 
produced by the SERPENT key schedule. In Sosemanuk, Serpent24 is 
used for the initialization step, only in encryption mode. Decryption is 
not used. 

2.2 The LFSR 

Underlying finite field. Most of the stream cipher internal state is held
in an LFSR containing 10 elements of $\mathbb{F}_{2^{32}}$, the field with
$2^{32}$ elements. The elements of $\mathbb{F}_{2^{32}}$ are represented
exactly as in SNOW 2.0. We recall this representation here. Let
$\mathbb{F}_2$ denote the finite field with 2 elements. Let $\beta$ be a
root of the primitive polynomial

$$Q(X) = X^8 + X^7 + X^5 + X^3 + 1$$

on $\mathbb{F}_2[X]$. We define the field $\mathbb{F}_{2^8}$ as the quotient
$\mathbb{F}_2[X]/Q(X)$. Each element in $\mathbb{F}_{2^8}$ is represented
using the basis $(\beta^7, \beta^6, \ldots, \beta, 1)$. Since the chosen
polynomial is primitive, $\beta$ is a multiplicative generator of all
invertible elements of $\mathbb{F}_{2^8}$: every non-zero element in
$\mathbb{F}_{2^8}$ is equal to $\beta^k$ for some integer $k$
($0 \leq k \leq 254$). Any element in $\mathbb{F}_{2^8}$ is identified with
an 8-bit integer by the following bijection:

$$\phi: \mathbb{F}_{2^8} \to \{0, 1, \ldots, 255\}, \qquad
x = \sum_{i=0}^{7} x_i \beta^i \mapsto \sum_{i=0}^{7} x_i 2^i,$$

where each $x_i$ is either 0 or 1. For instance, $\beta^{23}$ is
represented by the integer $\phi(\beta^{23})$ = 0xE1 (in hexadecimal).
Therefore, the addition of two elements in $\mathbb{F}_{2^8}$ corresponds
to a bitwise XOR between the corresponding integer representations. The
multiplication by $\beta$ is a left shift by one bit of the integer
representation, followed by an XOR with a fixed mask if the most
significant bit dropped by the shift equals 1.
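
As a minimal sketch of this representation, the multiplication by $\beta$ can be written on the integer form as follows. The helper names `mul_beta` and `beta_pow` are illustrative and are not taken from the reference code; the mask 0xA9 is simply the low byte of $Q(X)$ (bits 7, 5, 3 and 0).

```python
# Low 8 bits of Q(X) = X^8 + X^7 + X^5 + X^3 + 1 (assumed mask: bits 7, 5, 3, 0).
Q_MASK = 0xA9

def mul_beta(x: int) -> int:
    """Multiply an element of F_{2^8}, given as an 8-bit integer, by beta."""
    x <<= 1
    if x & 0x100:                  # the bit dropped by the shift was 1
        x = (x & 0xFF) ^ Q_MASK    # reduce modulo Q(X)
    return x

def beta_pow(k: int) -> int:
    """Return the integer representation of beta**k."""
    x = 1
    for _ in range(k):
        x = mul_beta(x)
    return x
```

With these definitions, `beta_pow(23)` evaluates to 0xE1, matching the value quoted above.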
Let $\alpha$ be a root of the primitive polynomial

$$P(X) = X^4 + \beta^{23} X^3 + \beta^{245} X^2 + \beta^{48} X + \beta^{239}$$

on $\mathbb{F}_{2^8}[X]$. The field $\mathbb{F}_{2^{32}}$ is then defined as
the quotient $\mathbb{F}_{2^8}[X]/P(X)$, i.e., its elements are represented
with the basis $(\alpha^3, \alpha^2, \alpha, 1)$. Any element in
$\mathbb{F}_{2^{32}}$ is identified with a 32-bit integer by the following
bijection:

$$\psi: \mathbb{F}_{2^{32}} \to \{0, 1, \ldots, 2^{32} - 1\}, \qquad
y = \sum_{i=0}^{3} y_i \alpha^i \mapsto \sum_{i=0}^{3} \phi(y_i) 2^{8i}.$$

Thus, the addition of two elements in $\mathbb{F}_{2^{32}}$ corresponds to a
bitwise XOR between their integer representations. This operation will
hereafter be denoted by $\oplus$. Sosemanuk also uses multiplications and
divisions of elements in $\mathbb{F}_{2^{32}}$ by $\alpha$. Multiplication of
$z \in \mathbb{F}_{2^{32}}$ by $\alpha$ corresponds to a left shift by 8 bits
of $\psi(z)$, followed by an XOR with a 32-bit mask which depends only on
the most significant byte of $\psi(z)$. Division of
$z \in \mathbb{F}_{2^{32}}$ by $\alpha$ is a right shift by 8 bits of
$\psi(z)$, followed by an XOR with a 32-bit mask which depends only on the
least significant byte of $\psi(z)$.
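
The shift-and-mask description can be sketched as below. This is an illustration under stated assumptions rather than the reference implementation. The names (`gf8_mul`, `byte_table`, `MUL_A`, `DIV_A`) are hypothetical. The multiplication mask follows from $\alpha^4 = \beta^{23}\alpha^3 + \beta^{245}\alpha^2 + \beta^{48}\alpha + \beta^{239}$. The division exponents (16, 39, 6, 64) are derived here from $P(\alpha) = 0$ and are not quoted from the text.

```python
Q_MASK = 0xA9  # low byte of Q(X) = X^8 + X^7 + X^5 + X^3 + 1

def gf8_mul(a: int, b: int) -> int:
    """Multiply two elements of F_{2^8} given as 8-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a = (a & 0xFF) ^ Q_MASK
        b >>= 1
    return r

def beta_pow(k: int) -> int:
    """Integer form of beta**k (2 is the integer form of beta)."""
    x = 1
    for _ in range(k):
        x = gf8_mul(x, 2)
    return x

def byte_table(exps):
    """For each byte c, pack c * beta^e over the four exponents, high byte first."""
    powers = [beta_pow(e) for e in exps]
    table = []
    for c in range(256):
        m = 0
        for p in powers:
            m = (m << 8) | gf8_mul(c, p)
        table.append(m)
    return table

MUL_A = byte_table((23, 245, 48, 239))   # coefficients of alpha^4 in the basis
DIV_A = byte_table((16, 39, 6, 64))      # coefficients of alpha^-1 (derived)

def mul_alpha(z: int) -> int:
    """psi(alpha * z): shift left by 8 bits, mask chosen by the top byte."""
    return ((z << 8) & 0xFFFFFFFF) ^ MUL_A[z >> 24]

def div_alpha(z: int) -> int:
    """psi(z / alpha): shift right by 8 bits, mask chosen by the low byte."""
    return (z >> 8) ^ DIV_A[z & 0xFF]
```

A quick sanity check of the derivation is that `div_alpha(mul_alpha(z)) == z` for any 32-bit `z`.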

Definition of the LFSR. The LFSR operates over elements of
$\mathbb{F}_{2^{32}}$. The initial state, at $t = 0$, entails the ten 32-bit
values $s_1$ to $s_{10}$. At each step, a new value is computed, with the
following recurrence:

$$s_{t+10} = s_{t+9} \oplus \alpha^{-1} s_{t+3} \oplus \alpha s_t,
\quad \forall t \geq 1,$$

and the register is shifted (see Figure 1 for an illustration of the LFSR).

Fig. 1. The LFSR



The LFSR is associated with the following feedback polynomial:

$$\pi(X) = \alpha X^{10} + \alpha^{-1} X^7 + X + 1 \in \mathbb{F}_{2^{32}}[X].$$

Since the LFSR is non-singular and since $\pi$ is a primitive polynomial,
the sequence of 32-bit words $(s_t)_{t \geq 1}$ is periodic and has maximal
period $2^{320} - 1$.
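
One LFSR step can be sketched as follows, assuming the field representation of Section 2.2. The function and table names are hypothetical, and the register is physically shifted here for readability only (an optimized implementation would avoid that). The exponents used for the division by $\alpha$ are derived from $P(\alpha) = 0$, not quoted from the text.

```python
from collections import deque

Q_MASK = 0xA9  # low byte of Q(X)

def gf8_mul(a, b):
    """Multiply two F_{2^8} elements given as 8-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a = (a & 0xFF) ^ Q_MASK
        b >>= 1
    return r

def beta_pow(k):
    x = 1
    for _ in range(k):
        x = gf8_mul(x, 2)
    return x

def byte_table(exps):
    """For each byte c, pack c * beta^e over the exponents, high byte first."""
    powers = [beta_pow(e) for e in exps]
    table = []
    for c in range(256):
        m = 0
        for p in powers:
            m = (m << 8) | gf8_mul(c, p)
        table.append(m)
    return table

MUL_A = byte_table((23, 245, 48, 239))
DIV_A = byte_table((16, 39, 6, 64))   # derived assumption, see lead-in

def mul_alpha(z):
    return ((z << 8) & 0xFFFFFFFF) ^ MUL_A[z >> 24]

def div_alpha(z):
    return (z >> 8) ^ DIV_A[z & 0xFF]

def lfsr_step(state: deque) -> int:
    """state holds (s_t, ..., s_{t+9}); update it in place, return dropped s_t."""
    s_t, s_t3, s_t9 = state[0], state[3], state[9]
    feedback = s_t9 ^ div_alpha(s_t3) ^ mul_alpha(s_t)   # s_{t+10}
    state.popleft()
    state.append(feedback)
    return s_t
```

Starting from `deque([s1, ..., s10])`, the first call returns $s_1$ and leaves $(s_2, \ldots, s_{11})$ in the register.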



2.3 The Finite State Machine 

The Finite State Machine (FSM) is a component with 64 bits of memory,
corresponding to two 32-bit registers $R1$ and $R2$. At each step, the FSM
takes as inputs some words from the LFSR state; it updates the memory
bits and produces a 32-bit output. The FSM operates on the LFSR state
at time $t \geq 1$ as follows:

$$FSM_t: (R1_{t-1}, R2_{t-1}, s_{t+1}, s_{t+8}, s_{t+9}) \mapsto (R1_t, R2_t, f_t)$$

where

$$R1_t = (R2_{t-1} + \mathrm{mux}(\mathrm{lsb}(R1_{t-1}), s_{t+1}, s_{t+1} \oplus s_{t+8})) \bmod 2^{32} \quad (1)$$
$$R2_t = \mathrm{Trans}(R1_{t-1}) \quad (2)$$
$$f_t = (s_{t+9} + R1_t \bmod 2^{32}) \oplus R2_t \quad (3)$$

where lsb($x$) is the least significant bit of $x$, and mux($c, x, y$) is
equal to $x$ if $c = 0$, or to $y$ if $c = 1$. The internal transition
function Trans on $\mathbb{F}_{2^{32}}$ is defined by

$$\mathrm{Trans}(z) = (M \times z \bmod 2^{32}) \lll 7$$

where $M$ is the constant value 0x54655307 (the hexadecimal expression
of the first ten decimals of $\pi$) and $\lll$ denotes bitwise rotation of a
32-bit value (by 7 bits here).
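
Equations (1) to (3) translate almost directly into code. The following is a minimal sketch with hypothetical function names, and the argument order of `fsm_step` is a choice made here for illustration.

```python
MASK32 = 0xFFFFFFFF
M = 0x54655307          # 1415926535 in decimal: the first ten decimals of pi

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def trans(z: int) -> int:
    """Trans(z) = (M * z mod 2^32) <<< 7."""
    return rotl32((M * z) & MASK32, 7)

def mux(c: int, x: int, y: int) -> int:
    """Return x if c == 0, y if c == 1."""
    return y if c else x

def fsm_step(r1: int, r2: int, s1: int, s8: int, s9: int):
    """Map (R1_{t-1}, R2_{t-1}, s_{t+1}, s_{t+8}, s_{t+9}) to (R1_t, R2_t, f_t)."""
    r1_new = (r2 + mux(r1 & 1, s1, s1 ^ s8)) & MASK32   # Equation (1)
    r2_new = trans(r1)                                  # Equation (2)
    f = ((s9 + r1_new) & MASK32) ^ r2_new               # Equation (3)
    return r1_new, r2_new, f
```

Note that $R2_t$ depends on the old value $R1_{t-1}$, so both registers must be computed from the previous state before either is overwritten.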



2.4 Output transformation 



The outputs of the FSM are grouped by four, and Serpent1 is applied to
each group; the result is then combined by XOR with the corresponding
dropped values from the LFSR, to produce the output values $z_t$:

$$(z_{t+3}, z_{t+2}, z_{t+1}, z_t) = \mathrm{Serpent1}(f_{t+3}, f_{t+2}, f_{t+1}, f_t) \oplus (s_{t+3}, s_{t+2}, s_{t+1}, s_t)$$

Four consecutive rounds of Sosemanuk are depicted in Figure 2.
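
The output transformation can be illustrated with a slow, table-driven bitslice loop. This is a sketch under assumptions: the $S_2$ table below is recalled from the SERPENT specification (Appendix A.1 is not reproduced in this section), the function names are hypothetical, and a real implementation would use a sequence of bitwise operations instead of a per-bit lookup.

```python
# SERPENT S-box S2, as recalled from the SERPENT specification (assumption).
S2 = [8, 6, 7, 9, 3, 12, 10, 15, 13, 1, 14, 4, 0, 11, 5, 2]

def serpent1(y3: int, y2: int, y1: int, y0: int):
    """Apply S2 in bitslice mode; y0 carries the least significant S-box bits."""
    words_in = (y0, y1, y2, y3)
    out = [0, 0, 0, 0]
    for j in range(32):
        # Gather bit j of each word into one 4-bit S-box input.
        nibble = sum(((w >> j) & 1) << i for i, w in enumerate(words_in))
        res = S2[nibble]
        # Scatter the 4-bit S-box output back to bit j of each output word.
        for i in range(4):
            out[i] |= ((res >> i) & 1) << j
    return out[3], out[2], out[1], out[0]

def output_block(f, s):
    """f = (f_{t+3}, f_{t+2}, f_{t+1}, f_t) and s = (s_{t+3}, ..., s_t).

    Returns (z_{t+3}, z_{t+2}, z_{t+1}, z_t).
    """
    return tuple(a ^ b for a, b in zip(serpent1(*f), s))
```

Since $S_2$ maps 0 to 8, an all-zero input group yields an all-ones most significant word and zeros elsewhere.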

2.5 Sosemanuk workflow 

The Sosemanuk cipher combines the FSM and the LFSR to produce the
output values $z_t$. Time $t = 0$ designates the internal state after
initialization; the first output value is $z_1$. Figure 3 gives a graphical
overview of Sosemanuk.

At time $t \geq 1$, we perform the following operations:

— The FSM is updated: $R1_t$, $R2_t$ and the intermediate value $f_t$ are
computed from $R1_{t-1}$, $R2_{t-1}$, $s_{t+1}$, $s_{t+8}$ and $s_{t+9}$.

— The LFSR is updated: $s_{t+10}$ is computed from $s_t$, $s_{t+3}$ and
$s_{t+9}$. The value $s_t$ is sent to an internal buffer, and the LFSR is
shifted.

Once every four steps, four output values $z_t$, $z_{t+1}$, $z_{t+2}$ and
$z_{t+3}$ are produced from the accumulated values
$f_t, f_{t+1}, f_{t+2}, f_{t+3}$ and $s_t, s_{t+1}, s_{t+2}, s_{t+3}$.
Thus, Sosemanuk produces 32-bit values. We recommend encoding them 
into groups of four bytes using the little-endian convention, because it is 
faster on the most widely used high-end software platform (x86-compatible 
PC), and because SERPENT uses that convention. 
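
The recommended byte encoding can be sketched as below; `encode_keystream` is a hypothetical helper name, and the word list is assumed to be in output order $z_1, z_2, \ldots$

```python
import struct

def encode_keystream(words):
    """Serialize 32-bit keystream words as bytes, little-endian, in order."""
    return b"".join(struct.pack("<I", w & 0xFFFFFFFF) for w in words)
```

For example, the word 0x01020304 is emitted as the bytes 04 03 02 01.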

Therefore, the first four iterations of Sosemanuk are as follows. 

— The LFSR initial state contains values $s_1$ to $s_{10}$; no value $s_0$
is defined. The FSM initial state contains $R1_0$ and $R2_0$.

— During the first step, $R1_1$, $R2_1$ and $f_1$ are computed from $R1_0$,
$R2_0$, $s_2$, $s_9$ and $s_{10}$.

— The first step produces the buffered intermediate values $s_1$ and $f_1$.

— During the first step, the feedback word $s_{11}$ is computed from
$s_{10}$, $s_4$ and $s_1$, and the internal state of the LFSR is updated,
leading to a new state composed of $s_2$ to $s_{11}$.

— The first four output values are $z_1$, $z_2$, $z_3$ and $z_4$, and are
computed using one application of Serpent1 over $(f_4, f_3, f_2, f_1)$,
whose output is combined by XORs with $(s_4, s_3, s_2, s_1)$.
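
The index bookkeeping of the workflow is easy to get wrong by one, so the following sketch only tracks which sequence indices each step touches. Both function names are hypothetical, and no cipher arithmetic is performed.

```python
def step_indices(t: int) -> dict:
    """Indices of the LFSR words used at step t (steps are numbered from 1)."""
    if t < 1:
        raise ValueError("steps are numbered from t = 1; s_0 is not defined")
    return {
        "fsm_inputs": (t + 1, t + 8, t + 9),    # s_{t+1}, s_{t+8}, s_{t+9}
        "lfsr_inputs": (t, t + 3, t + 9),       # s_t, s_{t+3}, s_{t+9}
        "feedback": t + 10,                     # s_{t+10} enters the register
        "buffered": t,                          # s_t and f_t go to the buffer
    }

def output_group(t: int) -> tuple:
    """Indices combined into (z_{t+3}, z_{t+2}, z_{t+1}, z_t)."""
    return tuple(range(t + 3, t - 1, -1))
```

For $t = 1$ this reproduces the indices listed above: FSM inputs $s_2, s_9, s_{10}$, feedback word $s_{11}$, and an output group over indices 4, 3, 2, 1.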
Fig. 2. The output transformation on four consecutive rounds of Sosemanuk. 




Fig. 3. An overview of Sosemanuk 



2.6 Key initialization and IV injection 

The Sosemanuk initialization process is split into two steps: 

— the key schedule, which processes the secret key but does not depend 
on the IV; and 

— the IV injection, which uses the output of the key schedule and the 
IV. This initializes the stream cipher internal state. 

Key schedule The key setup corresponds to the Serpent24 key schedule, 
which produces 25 128-bit subkeys, as 100 32-bit words. These 25 128-bit 
subkeys are identical to the first 25 128-bit subkeys produced by the plain 
SERPENT key schedule. 

SERPENT accepts any key length from 1 to 256 bits; hence, Sosemanuk
may work with exactly the same keys. However, since Sosemanuk aims at
128-bit security, its key length must be at least 128 bits. Therefore,
128 bits is the standard key length. Any key length from 128 bits to
256 bits is supported, but the security level still corresponds to 128-bit
security. In other words, using a longer secret key is not guaranteed to
provide the security level usually expected from such a key.

IV injection. The IV is a 128-bit value. It is used as input to the
Serpent24 block cipher, as initialized by the key schedule. Serpent24
consists of 24 rounds and the outputs of the 12th, 18th and 24th rounds
are used. We denote those outputs as follows:

- $(Y_3^{12}, Y_2^{12}, Y_1^{12}, Y_0^{12})$: output of the 12th round;

- $(Y_3^{18}, Y_2^{18}, Y_1^{18}, Y_0^{18})$: output of the 18th round;

- $(Y_3^{24}, Y_2^{24}, Y_1^{24}, Y_0^{24})$: output of the 24th round.

The output of each round consists of the four 32-bit words just after 
the linear transformation, except for the 24th round, for which the output 
is taken just after the addition of the 25th subkey. 

These values are used to initialize the Sosemanuk internal state, with 
the following values: 
3 Design rationale 

3.1 Key initialization and IV injection 

Underlying principle. A first property of the initialization process is that 
it is split into two distinct steps: the key schedule which does not depend 
on the IV, and the IV injection which generates the initial state of the 
generator from the IV and from the output of the key schedule. Then, 
the IV setup for a fixed key is less expensive than a complete key setup, 
improving the common design since changing the IV is more frequent 
than changing the secret key. 

A second characteristic of Sosemanuk is that the IV setup is derived
from the application of a block cipher over the IV. If we consider the
function $F_K$ which maps an $n$-bit IV to the first $n$ bits of output
stream generated from the key $K$ and the IV, then $F_K$ must be
computationally indistinguishable from a random function over
$\mathbb{F}_2^n$. Hence, the computation of $F_K$ cannot "morally" be faster
than the best known PRF over $n$-bit blocks. It so happens that the fastest
known PRFs use the same implementation techniques as the fastest known
Pseudo-Random Permutations (which are block ciphers), and achieve
equivalent performance.

Since Sosemanuk stream generation is very fast, the generation of $n$
stream bits takes little time compared to a computation of a robust PRP
over a block of $n$ bits. Following this path of reasoning, we decided to
use a block cipher as the foundation of the IV setup for Sosemanuk: the IV
setup itself cannot be much faster than the application of a block cipher,
and the security requirements for that step are very similar to what is
expected from a block cipher.

Choice of the block cipher. The block cipher used in the IV setup is 
derived from SERPENT for the following reasons: 

— SERPENT has been thoroughly analyzed during the AES selection 
process and its security is well-understood. 

— SERPENT needs no static data tables, and hence adds little or no 
data cache pressure. 

— The SERPENT round function is optimized for operation over data 
represented as 32-bit words, which is exactly how data is managed 
within Sosemanuk. Using SERPENT implies no tedious byte ex- 
traction from 32-bit words, or recombinations into such words. 

— We needed a block cipher for the key schedule and IV injection; using
something other than AES seems good for "biodiversity".

Design of Serpent24. The IV injection uses a reduced version of SERPENT
because SERPENT aimed at 256-bit security, whereas Sosemanuk is meant
for 128-bit security. The best linear bias and differential bias for
a 6-round version of SERPENT are $2^{-28}$ and $2^{-58}$ respectively [3].
Thus, 12 rounds should provide appropriate security. Twelve more rounds
are added in order to generate enough data (three 128-bit words are needed
for initializing Sosemanuk), hence 24 rounds for Serpent24. We rely on
the Sosemanuk core itself to provide some security margins (the output
of Serpent24 is not available directly to the attacker). Two consecutive
outputs of data are spaced with six inner rounds in order to prevent the
existence of relations between the bits of the initial state and the secret
key bits which could be used in an attack.

3.2 LFSR 

The SNOW 2.0 LFSR contains 16 elements, which means 512 bits of 
internal state. Since we aim only at 128-bit security, we can accommodate 
a shorter LFSR. To defeat time-memory-data trade-off attacks, 256 bits 
of internal state at least should be used; we wanted some security margin, 
hence an LFSR length a bit more than six words. 



LFSR length. The LFSR length n must be as small as possible: the bigger 
the state, the more difficult it is to map the state values on the proces- 
sor registers. Ideally, the total state should fit in the 16 general-purpose 
registers that the new AMD64 architecture offers. 

For efficient LFSR implementation, the LFSR must not be physically 
shifted; moving data around contributes nothing to actual security, and 
takes time. If n is the LFSR length, then kn steps (for some integer k) 
must be "unrolled" , so that at each step only one LFSR cell is modified. 
Moreover, since Serpentl operates over four successive output values, kn 
corresponds to lcm(4, n) and it should be kept as small as possible, since 
a higher code size increases code cache pressure. 

These considerations led us to n = 8 or 10. But, an LFSR of length 
eight presents potential weaknesses which may be exploited in a guess- 
and-determine attack (see Section 4.3). Therefore, a LFSR of length 10 
is a suitable choice: the 384-bit internal state length should be enough; 
only 20 steps need to be unrolled for an efficient implementation. The 
total internal state fits in 12 registers, which should map fine on the new 
AMD64 architecture. 
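
The unrolling and state-size arithmetic above can be checked with a few lines. This is a small illustrative sketch, and the helper names are hypothetical.

```python
from math import gcd

def unrolled_steps(n: int, group: int = 4) -> int:
    """Number of steps to unroll: lcm(group, n) for an LFSR of length n."""
    return n * group // gcd(n, group)

def state_bits(n: int, fsm_words: int = 2, word_bits: int = 32) -> int:
    """Total internal state: n LFSR words plus the FSM registers R1 and R2."""
    return (n + fsm_words) * word_bits
```

For $n = 10$ this gives 20 unrolled steps and a 384-bit state (12 words), while $n = 8$ would need only 8 unrolled steps.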

Feedback polynomial. The design criteria for the feedback polynomial are
similar to those used in SNOW 2.0. Since the feedback polynomial must
be as sparse as possible, we chose, as in SNOW 2.0, a primitive polynomial
of the form

$$\pi(X) = c_0 X^{10} + c_a X^{10-a} + c_b X^{10-b} + 1,$$

where $0 < a < b < 10$. The coefficients $c_0$, $c_a$ and $c_b$ preferably
lie in $\{1, \alpha, \alpha^{-1}\}$, which are the elements corresponding to
an efficient multiplication in $\mathbb{F}_{2^{32}}$. Moreover,
$\{c_0, c_a, c_b\}$ must contain at least two distinct non-binary elements;
otherwise, a multiple of $\pi$ with binary coefficients can be easily
constructed [11, 16], providing an equation which holds for each single
bit position.

We also want $a$ and $b$ to be coprime with the LFSR length. Otherwise,
for instance if $d = \gcd(a, 10) > 1$, the corresponding recurrence relation

$$s_{t+10} = c_b s_{t+b} + c_a s_{t+a} + c_0 s_t$$

involves three terms of a decimated sequence $(s_{dt+i})_{t>0}$ (for some
integer $i$), which can be generated by an LFSR of length $n/d$ [23]. These
conditions led us to $a = 3$ and $b = 9$. Since $a$ and $b$ are not coprime,
$c_a$ and $c_b$ must be different; otherwise, some simplified relations may
be exhibited by manipulating the feedback polynomial as shown in [16, 9].
The values $c_0 = \alpha$, $c_3 = \alpha^{-1}$ and $c_9 = 1$ correspond to a
suitable primitive polynomial that fulfills all previously mentioned
conditions.

3.3 FSM 

The Trans function. The Trans function is chosen according to the follow- 
ing implementation criteria: no static data tables in order to reduce the 
cache pressure and the function must be fast on modern processors. For 
these reasons, the Trans function is composed of a 32-bit multiplication 
and a bitwise rotation which are both very fast. The 32-bit multiplica- 
tion provides excellent "data mixing" compared to the number of clock 
cycles it consumes. The bitwise rotation avoids the existence of a linear 
relation between the least-significant bits of the inputs and the output of 
the FSM. 

The operations involved in the Trans function are incompatible with
the other operations used in the FSM (addition over $\mathbb{Z}_{2^{32}}$,
XOR operation). Indeed, mixing operations on the ring and on the vector
space breaks the associativity and commutativity laws. For instance, in
general,

$$(M \times (R2_{t-1} + s_{t+1} \bmod 2^{32}) \bmod 2^{32}) \lll 7 \neq
\bigl((M \times R2_{t-1} \bmod 2^{32}) \lll 7 + (M \times s_{t+1} \bmod 2^{32}) \lll 7\bigr) \bmod 2^{32}.$$
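
A concrete pair of values makes the inequality tangible. The sketch below (hypothetical helper names) compares both sides for small inputs; the mismatch comes from the rotation, since the multiplication alone does distribute over addition modulo $2^{32}$.

```python
MASK32 = 0xFFFFFFFF
M = 0x54655307

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def trans(z: int) -> int:
    """Trans(z) = (M * z mod 2^32) <<< 7."""
    return rotl32((M * z) & MASK32, 7)

def both_sides(a: int, b: int):
    """Return (Trans(a + b), Trans(a) + Trans(b)), both reduced mod 2^32."""
    lhs = trans((a + b) & MASK32)
    rhs = (trans(a) + trans(b)) & MASK32
    return lhs, rhs
```

With $a = b = 2$ the two sides are 0xCAA60E28 and 0xCAA60EA8, which differ in a single bit. With $a = b = 1$ they happen to coincide, which is why the inequality is stated as holding in general rather than always.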

The mux operation. The mux operation aims at increasing the complex- 
ity of fast correlation and algebraic attacks, since it decimates the FSM 
input sequence in an irregular fashion. Moreover, this operation can be 
implemented efficiently with either control bit extension and bitwise op- 
erations, or an architecture specific "conditional move" opcode. Modern 
C compilers know how to perform those optimizations when compiling 
the C conditional ternary operator "? : " . This multiplexer is quite fast 
and requires no jump. 

It is fitting that both LFSR elements $s_{t+c}$ and $s_{t+d}$ (with
$c < d$) in the mux operation are not involved in the recurrence relation.
Otherwise the complexity of guess-and-determine attacks might be reduced.
The distance $(d - c)$ between those elements must be coprime with the
LFSR length since they must not be expressed as a decimated sequence with
a lower linear complexity. Here, we choose $d - c = 7$. Finally, it must be
impossible for the inputs of the mux operation at two different steps to
correspond to the same element in the LFSR sequence. For this reason, the
mux operation outputs either $s_{t+c}$ or $s_{t+c} \oplus s_{t+d}$. If
$s_{t+c} \oplus s_{t+d}$ is the input of the FSM at time $t$, the possible
inputs at time $(t + d - c)$ are $s_{t+d}$ and
$s_{t+d} \oplus s_{t+2d-c}$, which do not match any previous input. It is
worth noticing that this property does not hold anymore if the mux outputs
either $s_{t+c}$ or $s_{t+d}$.
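
This non-repetition property can be checked symbolically by representing each possible mux output as the set of LFSR indices it XORs together. The sketch below uses hypothetical names and takes $c = 1$, $d = 8$ as in Equation (1).

```python
def mux_inputs_xor(t: int, c: int = 1, d: int = 8):
    """Chosen design: the mux outputs s_{t+c} or s_{t+c} XOR s_{t+d}."""
    return [frozenset({t + c}), frozenset({t + c, t + d})]

def mux_inputs_plain(t: int, c: int = 1, d: int = 8):
    """Rejected alternative: the mux outputs s_{t+c} or s_{t+d}."""
    return [frozenset({t + c}), frozenset({t + d})]

def has_repeated_input(inputs_at, steps: int = 64) -> bool:
    """True if the same combination of LFSR words can feed the FSM twice."""
    seen = set()
    for t in range(1, steps + 1):
        for term in inputs_at(t):
            if term in seen:
                return True
            seen.add(term)
    return False
```

Under this model the chosen design never repeats an input, whereas the plain alternative repeats $s_{t+8}$ at steps $t$ and $t + 7$.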

3.4 The output transformation 

The output transformation derived from Serpent1 aims at mixing four
successive outputs of the FSM in a nonlinear way. As a consequence, any
32-bit keystream word produced by Sosemanuk depends on four consecutive
intermediate values $f_t$. As a result, recovering any single output of
the FSM, $f_t$, in a guess-and-determine attack requires the knowledge of
at least four consecutive words from the LFSR sequence,
$s_t, s_{t+1}, s_{t+2}, s_{t+3}$ (see Section 4.3 for details).

The following properties have also been taken into account in the 
choice of output transformation. 

— Both nonlinear mixing operations involved in Sosemanuk (the Trans
operation and Serpent1 used in bitslice mode) do not provide any
correlation probability or linear property on the least significant bits
that could be used to mount an attack (see Section 4.4 for further
details).

— From an algebraic point of view, those operations are combined to
produce nonlinear equations (see Section 4.6).

— No linear relation can be directly exploited on the least significant bit
of the values $(f_t, f_{t+1}, f_{t+2}, f_{t+3})$, only quadratic equations
with more variables than the number of possible equations (see Section 4.4).

— The linear relation between $s_t$ and
Serpent1$(f_t, f_{t+1}, f_{t+2}, f_{t+3})$ protects Sosemanuk against
SQUARE-like attacks.

Finally, the fastest SERPENT S-box ($S_2$) has been chosen in Serpent1
from an efficiency point of view [22]. But $S_2$ also guarantees that there
is no differential-linear relation on the least significant bit (the "most
linear" one in the output of the FSM).

4 Resistance against known attacks 

Our stream cipher Sosemanuk offers 128-bit security, based on the 
following security model. 



4.1 Security model 



The attacker is a probabilistic Turing Machine with access to a black box 
(oracle) that accepts the following three instructions: Reset, Init with 
a 128-bit input, GetStream with a 1-bit output. The attacker's goal is 
to distinguish with probability 2/3 between a black box that generates 
random output, and a black box that implements the stream cipher, where 
Reset generates a random key, Init initializes the internal state of the 
stream cipher with a new chosen IV, and GetStream generates the next 
bit of keystream. The attacker is allowed to do $2^{128}$ elementary operations,
an instruction to the black box being an elementary operation. 

This security model falls under remarks made by Hong and Sarkar [18],
because the precomputation time is not bounded by our model. Therefore
our claim is that the 256-bit key variant of Sosemanuk provides 128-bit
security. We do not know of a formal security model that restricts
the precomputation time, i.e. that only allows the attacker one of the 
probabilistic Turing machines that can be built in a reasonable time from 
the current content of today's computers. Therefore, our claim is that 
the 128-bit key variant of Sosemanuk, and all variants with larger keys, 
provide a 128-bit security against an attacker that is not allowed to benefit 
from large precomputation. 

The following sections focus on the security of Sosemanuk against 
known attacks. It is important to note that the secret key of the cipher 
cannot be easily recovered from the initial state of the generator. Once the 
initial state is recovered, the attacker is only able to generate the output 
sequence for a particular key and a given IV. Recovering the secret key 
or generating the output for a different IV additionally requires the cost 
of an attack on Serpent24 with a certain number of plaintext/ciphertext 
pairs. 

4.2 Time-memory-data tradeoff attacks 

Due to the choice of the length of the LFSR (more than twice the key
length), the time-memory-data tradeoff attacks described in [2, 14, 5] are
impracticable. Moreover, since these TMDTO attacks aim at recovering
the internal state of the cipher, recovering the secret key requires the
additional cost of an attack against Serpent24. The best
time-memory-data tradeoff attack is Hellman's [17], which aims at
recovering a pair $(K, IV)$. For a 128-bit secret key and a 128-bit IV, its
time complexity is equal to $2^{128}$ cipher operations (see [18] for
further details).



4.3 Guess and determine attacks 



The main weaknesses of SNOW 1.0 are related to this type of attack (two
at least have been exhibited [16], [9]). They essentially exploit a
particular weakness in the linear recurrence equation. This does not hold
anymore for the new polynomial choice in SNOW 2.0 and for the polynomial
used in Sosemanuk, which involve non-binary multiplications by two
different constants. The first attack [16] also exploited a "trick" coming
from the dependence between the values $R1_{t-1}$ and $R1_t$. This trick is
avoided in SNOW 2.0 (because there is no direct link between those two
register values anymore) and in Sosemanuk.

The best guess and determine attack we have found on SOSEMANUK 
is the following. 

— Guess at time $t$: $s_t$, $s_{t+1}$, $s_{t+2}$, $s_{t+3}$, $R1_{t-1}$ and
$R2_{t-1}$ (6 words).

— Compute the corresponding outputs of the FSM
$(f_t, f_{t+1}, f_{t+2}, f_{t+3})$.

— Compute $R2_t = \mathrm{Trans}(R1_{t-1})$ and $R1_t$ from Equation (1) if
lsb$(R1_{t-1}) = 1$ (this can be done only with probability 1/2).

— From $f_t = (s_{t+9} + R1_t \bmod 2^{32}) \oplus R2_t$, compute $s_{t+9}$.

— Compute $R1_{t+1}$ from the knowledge of both $s_{t+2}$ and $s_{t+9}$;
compute $R2_{t+1}$. Compute $s_{t+10}$ from $f_{t+1}$, $R1_{t+1}$ and
$R2_{t+1}$.

— Compute $R1_{t+2}$ from $s_{t+3}$ and $s_{t+10}$; compute $R2_{t+2}$.
Compute $s_{t+11}$ from $f_{t+2}$, $R1_{t+2}$ and $R2_{t+2}$. Now, $s_{t+4}$
can be recovered due to the feedback relation at time $t + 1$:

$$\alpha^{-1} s_{t+4} = s_{t+11} \oplus s_{t+10} \oplus \alpha s_{t+1}.$$

— Compute $R1_{t+3}$ from $s_{t+4}$ and $s_{t+11}$; compute $R2_{t+3}$.
Compute $s_{t+12}$ from $f_{t+3}$, $R1_{t+3}$ and $R2_{t+3}$. Compute
$s_{t+5}$ by the feedback relation at time $t + 2$:

$$\alpha^{-1} s_{t+5} = s_{t+12} \oplus s_{t+11} \oplus \alpha s_{t+2}.$$

At this point, the LFSR words
$s_t, s_{t+1}, s_{t+2}, s_{t+3}, s_{t+4}, s_{t+5}, s_{t+9}$ are known. Three
elements ($s_{t+6}$, $s_{t+7}$, $s_{t+8}$) remain unknown. To complete the
full 10-word state of the LFSR, we need to guess 2 more words, $s_{t+6}$
and $s_{t+7}$, since each $f_{t+i}$, $4 \leq i \leq 7$, depends on all 4
words $s_{t+4}$, $s_{t+5}$, $s_{t+6}$ and $s_{t+7}$. Therefore, this attack
requires the guess of 8 32-bit words, leading to a complexity of $2^{256}$.
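
The step that recovers $s_{t+9}$ simply inverts Equation (3). A minimal sketch of that inversion and of the final complexity count follows; all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def fsm_output(s9: int, r1_t: int, r2_t: int) -> int:
    """Equation (3): f_t = (s_{t+9} + R1_t mod 2^32) XOR R2_t."""
    return ((s9 + r1_t) & MASK32) ^ r2_t

def recover_s9(f_t: int, r1_t: int, r2_t: int) -> int:
    """Solve Equation (3) for s_{t+9}, given f_t, R1_t and R2_t."""
    return ((f_t ^ r2_t) - r1_t) & MASK32

def guess_complexity_log2(guessed_words: int = 8, word_bits: int = 32) -> int:
    """log2 of the number of guesses: 8 words of 32 bits give 256."""
    return guessed_words * word_bits
```

The same inversion is reused at times $t + 1$, $t + 2$ and $t + 3$ to obtain $s_{t+10}$, $s_{t+11}$ and $s_{t+12}$.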

Note that in [1] and in [25] the authors respectively proposed two
guess and determine attacks against Sosemanuk that have a complexity
approximately equal to $2^{226}$ and $2^{224}$ computations. However, as
stated in Sections 2.6, 3.2 and 4.1, we never intended to have more than
128-bit security. The internal state of Sosemanuk is 384 bits long, which
would be bad practice if we aimed at 256-bit security. Therefore, those
guess-and-determine attacks, while being interesting theoretical studies,
do not compromise the security of Sosemanuk.

4.4 Correlation attacks 

In order to find a relevant correlation in Sosemanuk, the following ques- 
tions can be addressed: 

— does there exist a linear relation at bit level between some input and 
output bits? 

— does there exist a particular relation between some input bit vector 
and some output bit vector? 

In the first case, two linear relations could be exhibited at the bit level.
In the first, the least significant bit of $s_{t+9}$ is "conserved", since
the modular addition over $\mathbb{Z}_{2^{32}}$ is a linear operation on the
least significant bit. The second linear relation induced by the FSM
concerns the least significant bit of $s_{t+1}$ or of
$s_{t+1} \oplus s_{t+8}$ (used to compute $R1_t$) or the seventh bit of
$R2_t$ computed from $s_t$ or from $s_t \oplus s_{t+7}$. We here use that
$R2_t = \mathrm{Trans}(R1_{t-1})$ and
$R1_{t-1} = R2_{t-2} + (s_t \text{ or } (s_t \oplus s_{t+7})) \bmod 2^{32}$.

No linear relation holds after applying Serpent1 and there are too
many unknown bits to exploit a relation on the output words due to the
bitslice design. Moreover, a fast correlation attack seems to be
impracticable because the mux operation prevents certainty in the
dependence between the LFSR states and the observed keystream.

4.5 Distinguishing attacks 

A distinguishing attack by D. Coppersmith, S. Halevi and C. Jutla (see
[10]) against the first version of SNOW used a particular weakness of the
feedback polynomial built on a single multiplication by $\alpha$. This
property does not hold for the choice of the new polynomial in SNOW 2.0
and for the polynomial used in Sosemanuk, where multiplication by
$\alpha^{-1}$ is also included.

In [26], D. Watanabe, A. Biryukov and C. De Canniere have mounted
a new distinguishing attack on SNOW 2.0 with a complexity of about
$2^{225}$ operations using the multiple linear masking method. They
construct 3 different masks $\Gamma_1 = \Gamma$,
$\Gamma_2 = \Gamma \cdot \alpha$ and $\Gamma_3 = \Gamma \cdot \alpha^{-1}$
based on the same linear relation $\Gamma$.

The linear property deduced from the masks $\Gamma_i$ ($i = 1$, 2 or 3)
must hold with a high probability on both of the following quantities:
$\Gamma_i \cdot S(x) = \Gamma_i \cdot x$ and
$\Gamma_i \cdot z \oplus \Gamma_i \cdot t = \Gamma_i \cdot (z \boxplus t)$
for $i = 1$, 2 and 3, where $S$ is the transition function of the FSM in
SNOW 2.0 and $\boxplus$ denotes addition modulo $2^{32}$. In the case of
SNOW 2.0, the hardest hypothesis to satisfy is the first one, defined on
$y = S(x)$. In the case of Sosemanuk, we need
$\Pr(\Gamma_i \cdot \mathrm{Trans}(x) = \Gamma_i \cdot x)_{i=1,2,3}$ to be
high. But we also need that, $\forall i = 1, 2, 3$, the relation

$$(\Gamma'_i, \Gamma'_i, \Gamma'_i, \Gamma'_i) \cdot (x_1, x_2, x_3, x_4) =
\mathrm{Serpent1}((\Gamma_i, \Gamma_i, \Gamma_i, \Gamma_i) \cdot (x_1, x_2, x_3, x_4)),$$

for some $\Gamma'_i \in \mathbb{F}_2^{32}$, holds with a high probability.

Due to the bitslice design chosen for Serpent1, it seems very difficult
to find such a mask. Therefore, the attack described in [26] could not be
applied directly on Sosemanuk.

4.6 Algebraic attacks 

Let us consider, as in [4], the initial state of the LFSR at bit level:

$$(s_{10}, \ldots, s_1) = (s_{10}^{31}, \ldots, s_{10}^{0}, \ldots, s_1^{31}, \ldots, s_1^{0}).$$

Then, the outputs of Sosemanuk at time $t \geq 1$ could be written:

$$F^t((s_{10}^{31}, \ldots, s_1^{0})) = (z_t, z_{t+1}, z_{t+2}, z_{t+3})$$

where $F^t$ is a vectorial Boolean function from $\mathbb{F}_2^{320}$ into
$\mathbb{F}_2^{128}$ that could be seen as 128 Boolean functions $F_j$,
$\forall j \in [0..127]$, from $\mathbb{F}_2^{320}$ into $\mathbb{F}_2$.

Let us study the degree of an $F_j$ function depending on a particular
bit of the output or on a linear combination of output bits, because it is
not possible to directly compute the algebraic immunity of each function
$F_j$ due to the very large number of variables (320 input bits). We think
that the following remarks prevent the existence of low degree relations
between the inputs and the outputs of $F_j$.

— The output bit $i$ after the modular addition on $\mathbb{Z}_{2^{32}}$ is
of degree $i + 1$ (as described in [6]).

— The output bit $i$ after the Trans mapping is of degree
$i + 1 - 7 \bmod 32$, $\forall i \neq 6$, and equal to 32 for $i = 6$ (as
described in [6]).



— The mux operation makes it impossible to determine with probability one 
the exact number of bits of the initial state involved in the algebraic 
relation. 

— The algebraic immunity of the SERPENT S-box $S_2$ at 4-bit word level 
is equal to 2 (see [21] for a definition of the algebraic immunity and 
more details). 

Given these remarks, we think that an algebraic attack against 
Sosemanuk is intractable. 
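
The remark on modular addition can be checked on a toy word size. The sketch below computes the algebraic normal form of each output bit of addition modulo $2^n$ with the Möbius transform and reports its degree. It uses $n = 4$ (8 input bits) so that the truth tables stay small; the function names are illustrative assumptions, and the check says nothing about Trans, mux or Serpent1.

```python
def algebraic_normal_form(truth_table):
    """Moebius transform of a truth table (on a copy): ANF coefficients over GF(2)."""
    coeffs = list(truth_table)
    n_vars = len(coeffs).bit_length() - 1
    for var in range(n_vars):
        bit = 1 << var
        for index in range(len(coeffs)):
            if index & bit:
                coeffs[index] ^= coeffs[index ^ bit]
    return coeffs


def algebraic_degree(truth_table):
    """Largest number of variables in any monomial of the ANF (0 if constant)."""
    coeffs = algebraic_normal_form(truth_table)
    weights = [bin(m).count("1") for m, c in enumerate(coeffs) if c]
    return max(weights, default=0)


def addition_bit_degrees(n_bits):
    """Degree of each output bit of x + y mod 2^n_bits, as a function of (x, y)."""
    mask = (1 << n_bits) - 1
    degrees = []
    for i in range(n_bits):
        # Input index v packs x in the low n_bits and y in the high n_bits.
        table = [(((v & mask) + (v >> n_bits)) >> i) & 1
                 for v in range(1 << (2 * n_bits))]
        degrees.append(algebraic_degree(table))
    return degrees


if __name__ == "__main__":
    print(addition_bit_degrees(4))
```

For $n = 4$ the degrees grow as $i + 1$, matching the first remark. The same routine becomes impractical long before 32-bit words, which is why the text argues from structural remarks instead.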

5 Implementation 

The reference C implementation is also an optimized implementation. 
When compiled with the SOSEMANUK_VECTOR macro defined, it is a full 
program (with its own main() function) which outputs two detailed test 
vectors. Since the LFSR length is ten, the C code is unrolled over 20 rounds 
(see Section 3.2 for details). Each test vector contains: 

— A copy of the secret key (a sequence of bytes, expressed in hexadeci- 
mal). 

— The expanded secret key, as described by the SERPENT specification: 
the key is expanded to 256 bits, then read as a 256-bit number with 
the little endian convention. The test vector outputs that key as a big 
hexadecimal number, with some digit grouping. 

— The 25 Serpent24 subkeys, each of them consisting of four 32-bit words 
(in the $(K_3, K_2, K_1, K_0)$ order). 

— The 128-bit IV, as a sequence of 16 bytes. 

— The IV, once transformed into four 32-bit words, in the $(I_3, I_2, I_1, I_0)$ 
order. 

— The initial LFSR state ($s_1$ to $s_{10}$, in that order). 

— The initial FSM state ($R1$ and $R2$). 

— Ten times the following data: 

• Four times the following: 

* the new FSM state ($R1_t$ and $R2_t$); 

* the new LFSR state, after the update (the dropped value $s_t$ is 
also output); 

* the intermediate output $f_t$. 

• The Serpent1 input. 

• The Serpent1 output. 

• 16 bytes of Sosemanuk output. 

— The total stream output (160 bytes). 
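
The expanded-key item of the list above can be illustrated as follows. The sketch assumes the SERPENT padding rule (a key shorter than 256 bits is extended with a single 1 bit followed by 0 bits) and byte-oriented keys, so that the padding becomes one 0x01 byte followed by zero bytes. The function names and the 8-digit grouping are hypothetical; the exact output format of the reference code may differ.

```python
def expand_serpent_key(key):
    """Expand a 16..32-byte key to a 256-bit integer, little-endian.

    Assumption: SERPENT-style padding, i.e. one '1' bit then '0' bits,
    which for whole bytes means appending 0x01 followed by 0x00 bytes.
    """
    if not 16 <= len(key) <= 32:
        raise ValueError("key length must be between 128 and 256 bits")
    padded = key
    if len(padded) < 32:
        padded = padded + b"\x01" + b"\x00" * (31 - len(key))
    return int.from_bytes(padded, "little")


def format_expanded_key(value, group=8):
    """Render the 256-bit number as 64 hex digits with some digit grouping."""
    digits = f"{value:064x}"
    return " ".join(digits[i:i + group] for i in range(0, 64, group))


if __name__ == "__main__":
    sample_key = bytes(range(16))  # arbitrary 128-bit example key
    print(format_expanded_key(expand_serpent_key(sample_key)))
```

With a 128-bit key, the padding bit lands at bit position 128 of the expanded number, just above the most significant key byte.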



6 Performance 



6.1 Software implementation 

This section is devoted to the software performance of Sosemanuk. It 
compares the performance of Sosemanuk with the other candidates se- 
lected in the Phase 3 (Software Profile), SNOW 2.0 and AES-CTR using 
the eSTREAM testing framework and the provided reference C imple- 
mentations [7]. The three tables Table 1, Table 2 and Table 3 sum up the 
results (for the keystream generation, the IV setup and the key setup) 
given in [8] for three different architectures: an Intel Pentium 4 (CISC 
target), an AMD Athlon64 X2 4200+ (CISC target) and an Alpha EV6 
(RISC target). 

All the results presented for Sosemanuk have been computed using 
the supplied reference C implementation. 

Code size. The main unrolled loop implies a code size between 2 and 5 
KB depending on the platform and the compiler. Therefore, the entire 
code fits in the L1 cache. 

Static data. The reference C implementation uses static data tables with 
a total size equal to 4 KB. This amount is 3 times smaller than the size of 
static data required in SNOW 2.0, leading to a lower data cache pressure. 

Key setup. We recall that the key setup (the subkey generation given by 
Serpent24) is performed once and that each new IV injection for a given key 
corresponds to a small version of the block cipher SERPENT. 

The performance of the key setup and of the IV setup in Sosemanuk 
is directly derived from the performance of SERPENT [13]. Due to 
intellectual property aspects, our reference implementation does not 
reuse the best implementation of SERPENT. However, the performance 
given in [20] (i.e., measured on Gladman's code written in assembly 
language [13]) leads to the following estimates on a Pentium 4: 

— key setup ≈ 900 cycles; 

— IV setup ≈ 480 cycles. 

These estimates for the IV setup (resp. key setup) performance 
correspond to about 3/4 of the best published performance for SERPENT 
encryption (resp. for SERPENT key schedule). 



Performance results. Table 1, Table 2 and Table 3 present the 
performance of the keystream generation (using four performance measures), 
the agility, the IV setup and the key setup, in order to test the most relevant 
implementation properties. The four elementary tests for keystream generation 
are the encryption rate for long streams, measured by ciphering a long stream in 
chunks of about 4 KB, and the packet encryption rate for three packet lengths 
(40, 576 and 1500 bytes), each including an IV setup. The agility test initiates 
a large number of sessions (filling 16 MB of RAM), and then encrypts 
streams of plaintext in short blocks of around 256 bytes, each time 
jumping from one session to another. 









                        ------------------ cycles/byte -----------------  cycles/key  cycles/IV
Algo.         Key   IV  Stream  40 bytes  576 bytes  1500 bytes  agility   Key setup   IV setup
------------  ---  ---  ------  --------  ---------  ----------  -------  ----------  ---------
AES CTR       128  128   17.81     29.19      18.35       18.04    20.77      393.45      76.16
SNOW v2.0     128  128    5.04     35.60       6.92        5.92     7.95       85.44    1000.54
CryptMT (v3)  128  128    5.27     39.12      12.09       11.55    11.35       53.71     849.25
DRAGON        128  128   11.37     74.09      26.07       23.23    15.00      256.04    1925.54
HC-128        128  128    3.76   1458.58     104.86       42.64    19.02       78.81   56929.45
HC-256        128  128    4.39   2596.20     184.25       73.59    26.27       76.66  104341.33
LEXv1         128  128    9.46     20.78      10.88       10.01    12.30      486.57     449.00
NLSv2         128  128    6.64     38.94       8.52        6.97    12.10      823.74     704.68
Rabbit        128   64    9.46     34.45      11.77       10.76    12.89      984.27     825.55
Salsa20       128   64   16.61     42.21      17.63       18.57    18.71       90.32      78.19
SOSEMANUK     128   64    5.81     52.37      12.52        9.62     7.40     1287.55    1245.71



Table 1. Number of CPU cycles for the stream ciphers using a Pentium 4 at 2.80GHz, 
Model 15/2/9 



                        ------------------ cycles/byte -----------------  cycles/key  cycles/IV
Algo.         Key   IV  Stream  40 bytes  576 bytes  1500 bytes  agility   Key setup   IV setup
------------  ---  ---  ------  --------  ---------  ----------  -------  ----------  ---------
AES CTR       128  128   13.39     18.09      13.39       13.35    15.03      152.81      15.58
SNOW v2.0     128  128    4.83     23.18       5.77        5.34     6.46       43.37     528.04
CryptMT (v3)  128  128    4.65     19.26       8.47        7.64     8.82       25.47     384.33
DRAGON        128  128    7.76     60.20      25.90       24.31    10.01       89.90    1449.74
HC-128        128  128    2.86    587.00      43.19       18.43    13.07       37.85   23308.78
HC-256        128  128    4.72   1420.99     103.10       42.83    21.13       41.31   56725.89
LEXv1         128  128    6.84     14.19       7.78        7.20     9.19      226.41     268.31
NLSv2         128  128   10.69     53.24      13.45       11.48    14.13      453.35    1293.15
Rabbit        128   64    4.98     14.60       5.55        5.25     6.34      288.21     292.38
Salsa20       128   64    7.64     16.10       7.74        7.91     8.93       24.57      14.29
SOSEMANUK     128   64    4.07     25.26       7.20        6.10     5.12      759.06     560.63

Table 2. Number of CPU cycles for the stream ciphers using an AMD Athlon 64 X2 
4200+ at 2.20GHz, Model 15/75/2 

                        ------------------ cycles/byte -----------------  cycles/key  cycles/IV
Algo.         Key   IV  Stream  40 bytes  576 bytes  1500 bytes  agility   Key setup   IV setup
------------  ---  ---  ------  --------  ---------  ----------  -------  ----------  ---------
AES CTR       128  128   15.53     24.63      15.94       15.82    17.80      633.65      37.58
SNOW v2.0     128  128    5.17     23.74       6.11        5.73     6.37       69.00     489.35
CryptMT (v3)  128  128    6.90     24.74      11.64       11.75    12.86       37.49     422.17
DRAGON        128  128    8.46     74.94      41.89       40.52    10.13      234.33    1542.46
HC-128        128  128    3.90   1029.93      77.41       31.59    14.80       54.67   42130.00
HC-256        128  128    5.18   2414.77     171.48       69.34    23.53       52.96   95937.00
LEXv1         128  128    7.99     16.87       9.15        8.44     9.53      198.49     334.58
NLSv2         128  128    5.93     24.26       6.44        5.59     7.94      530.39     421.66
Rabbit        128   64    5.27     14.49       5.69        5.53     6.32      318.57     280.63
Salsa20       128   64   13.61     39.93      13.77       14.34    14.46       33.60      20.16
SOSEMANUK     128   64    4.63     28.80       7.66        6.26     5.32     1301.09     692.71

Table 3. Number of CPU cycles for the stream ciphers using an Alpha EV6 at 500MHz, 
Model 21264 

As shown in these tables, Sosemanuk remains among the fastest 
algorithms on several platforms due to a good design for the mappings of 
data on the processor registers and a low data cache pressure. 

6.2 Hardware implementation 

In [15], the authors propose hardware implementations and performance 
metrics for several stream cipher candidates and especially Sosemanuk. 
They remark that even if the design of Sosemanuk is a little bit complex 
to implement, it leads to an impressive performance. The required number 
of gates for designing Sosemanuk on 0.13 µm Standard Cell CMOS with 
a key of length 256 bits is 18819 considering that 32 bits are outputted at 
each cycle. Moreover, the corresponding leakage power is 33.55 for a 
total power at 10MHz equal to 812.47 µW. The authors also derive the 
metrics for maximum clock frequency and for an output rate at 10 Mbps 
(estimated typical future wireless LAN). In this last case, the 
corresponding clock frequency is equal to 0.313 MHz for a Power-Area-Time equal 
to 564.8 nJ·µm². In conclusion, they recommend Sosemanuk for WLAN 
applications with a key length equal to 256 bits. They say that "with 
regard to Sosemanuk, the utility as a hardware cipher is clear thus in 
our opinion requires adding to the hardware focus profile." 



7 Strengths and advantages of Sosemanuk 

The new synchronous stream cipher Sosemanuk, based upon the SNOW 
2.0 design, improves it from several points of view. From a security point of 
view, Sosemanuk avoids some potential weaknesses, such as the distinguishing 
attack proposed in [26], thanks to the particular use of Serpent1 in bitslice 
mode. The chosen LFSR is designed to eliminate all potential weaknesses 
(particular decimation properties, linear relations, ...). The mappings used 
in the Finite State Machine have been carefully designed in the following 
way: 

— The Trans function guarantees good properties of confusion and 
diffusion for a low cost in software. Moreover, this mapping protects 
Sosemanuk against algebraic attacks. 

— The mux operation, which can be efficiently implemented, protects 
Sosemanuk against fast correlation attacks and algebraic attacks. 

The Serpent1 output transformation, very efficient in bitslice mode, 
provides nonlinear equations and good diffusion, and it improves the 
resistance to guess-and-determine attacks. 

The new design chosen for the key setup and the IV injection allows 
the initialization procedure to be split into two distinct parts, without any 
loss of security. It leads to a much faster resynchronization mechanism. 

From an efficiency point of view, due to a reduced amount of static 
data and a reduced internal state size, the exploitation of the processor 
registers is enhanced and the data cache pressure is reduced on several 
platforms, especially on RISC architectures. 
A Specifications of SERPENT 

In this appendix, we recall the specifications of SERPENT given in [3]. 
The S-box definitions are given first, followed by the linear part of the 
round function. 

A.1 S-box definitions 

The eight SERPENT S-boxes act on 4-bit words and are defined as 
permutations of $\mathbb{Z}_{16}$: 

$S_0$ : 3, 8, 15, 1, 10, 6, 5, 11, 14, 13, 4, 2, 7, 0, 9, 12 

$S_1$ : 15, 12, 2, 7, 9, 0, 5, 10, 1, 11, 14, 8, 6, 13, 3, 4 

$S_2$ : 8, 6, 7, 9, 3, 12, 10, 15, 13, 1, 14, 4, 0, 11, 5, 2 

$S_3$ : 0, 15, 11, 8, 12, 9, 6, 3, 13, 1, 2, 4, 10, 7, 5, 14 

$S_4$ : 1, 15, 8, 3, 12, 0, 11, 6, 2, 5, 4, 10, 9, 14, 7, 13 

$S_5$ : 15, 5, 2, 11, 4, 10, 9, 12, 0, 3, 14, 8, 13, 6, 7, 1 

$S_6$ : 7, 2, 12, 5, 8, 4, 6, 11, 14, 9, 1, 15, 13, 3, 10, 0 

$S_7$ : 1, 13, 15, 0, 14, 8, 2, 11, 7, 4, 12, 10, 9, 3, 5, 6 
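
As an illustration of how such a 4-bit permutation is used in bitslice mode, the sketch below applies an S-box to four 32-bit words by treating bit $j$ of each word as one 4-bit input, with $X_0$ providing the least significant bit. It is a slow, table-driven reference only (optimized implementations use short sequences of Boolean instructions instead of a lookup). The function name is hypothetical, and the bit ordering is an assumption following the usual SERPENT bitslice convention.

```python
# S-box S2 as listed above (the one used by Serpent1).
S2 = (8, 6, 7, 9, 3, 12, 10, 15, 13, 1, 14, 4, 0, 11, 5, 2)


def bitslice_sbox(sbox, x0, x1, x2, x3):
    """Apply a 4-bit S-box in parallel to the 32 bit columns of (x3, x2, x1, x0).

    Assumption: x0 carries the least significant bit of each 4-bit input
    and output; all words are 32-bit unsigned integers.
    """
    y = [0, 0, 0, 0]
    for j in range(32):
        nibble = (((x0 >> j) & 1)
                  | (((x1 >> j) & 1) << 1)
                  | (((x2 >> j) & 1) << 2)
                  | (((x3 >> j) & 1) << 3))
        out = sbox[nibble]
        for k in range(4):
            y[k] |= ((out >> k) & 1) << j
    return tuple(y)


if __name__ == "__main__":
    words = bitslice_sbox(S2, 0x00000000, 0x00000000, 0x00000000, 0x00000000)
    print([f"{w:08x}" for w in words])
```

With all-zero inputs every column reads 0 and is mapped to $S_2(0) = 8$, so only the most significant output word has its bits set.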

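A.2 Linear part of SERPENT round function 

The linear part of a one-round version of SERPENT acts on four 32-bit words 
$(X_3, X_2, X_1, X_0)$, where $X_0$ is the least significant word, and is defined as 
follows, where $\lll$ denotes a left rotation and $\ll$ a left shift of a 32-bit word: 

$$
\begin{aligned}
X_0 &= X_0 \lll 13 \\
X_2 &= X_2 \lll 3 \\
X_1 &= X_1 \oplus X_0 \oplus X_2 \\
X_3 &= X_3 \oplus X_2 \oplus (X_0 \ll 3) \\
X_1 &= X_1 \lll 1 \\
X_3 &= X_3 \lll 7 \\
X_0 &= X_0 \oplus X_1 \oplus X_3 \\
X_2 &= X_2 \oplus X_3 \oplus (X_1 \ll 7) \\
X_0 &= X_0 \lll 5 \\
X_2 &= X_2 \lll 22
\end{aligned}
$$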
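
The ten steps of the linear part translate directly into code. The sketch below assumes, as in the SERPENT specification [3], that the two parenthesized terms use a plain left shift while all other operations are 32-bit rotations. The names `rotl32` and `serpent_linear` are illustrative and not taken from the reference implementation.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, r):
    """Rotate the 32-bit word x left by r bits (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def serpent_linear(x0, x1, x2, x3):
    """Linear part of one SERPENT round on four 32-bit words.

    Assumption: (x0 << 3) and (x1 << 7) are shifts, not rotations,
    following the SERPENT specification.
    """
    x0 = rotl32(x0, 13)
    x2 = rotl32(x2, 3)
    x1 ^= x0 ^ x2
    x3 ^= x2 ^ ((x0 << 3) & MASK32)
    x1 = rotl32(x1, 1)
    x3 = rotl32(x3, 7)
    x0 ^= x1 ^ x3
    x2 ^= x3 ^ ((x1 << 7) & MASK32)
    x0 = rotl32(x0, 5)
    x2 = rotl32(x2, 22)
    return x0, x1, x2, x3


if __name__ == "__main__":
    print([f"{w:08x}" for w in serpent_linear(1, 0, 0, 0)])
```

Because every step is a rotation, a shift or an XOR, the mapping is linear over $\mathbb{F}_2$: the image of an XOR of two inputs is the XOR of their images.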



